Smart Paper Notes

Foveate, Attribute, and Rationalize: Towards Safe and Trustworthy AI

Alex Mei , Sharon Levy , William Yang Wang

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-12-19

Users' physical safety is an increasing concern as the market for intelligent systems continues to grow, since unconstrained systems may recommend dangerous actions to users that can lead to serious injury. Covertly unsafe text, language that contains actionable physical harm but requires further reasoning to identify such harm, is an area of particular interest, as such texts may arise from everyday scenarios and are challenging to detect as harmful. Qualifying the knowledge required to reason about the safety of various texts and providing human-interpretable rationales can shed light on the risk of systems to specific user groups, helping stakeholders manage the risks of their systems and helping policymakers provide concrete safeguards for consumer safety. We propose FARM, a novel framework that leverages external knowledge for trustworthy rationale generation in the context of safety. In particular, FARM foveates on missing knowledge in specific scenarios, retrieves this knowledge with attribution to trustworthy sources, and uses it both to classify the safety of the original text and to generate human-interpretable rationales, combining critically important qualities for sensitive domains such as user safety. Furthermore, FARM obtains state-of-the-art results on the SafeText dataset, improving safety classification accuracy by 5.29 points.


Towards Generating Diverse Audio Captions via Adversarial Training

Xinhao Mei , Xubo Liu , Jianyuan Sun , Mark D. Plumbley , Wenwu Wang

Categories: Artificial Intelligence

2022-12-05

Automated audio captioning is a cross-modal translation task for describing the content of audio clips with natural language sentences. This task has attracted increasing attention, and substantial progress has been made in recent years. Captions generated by existing models are generally faithful to the content of audio clips; however, these machine-generated captions are often deterministic (e.g., generating a fixed caption for a given audio clip), simple (e.g., using common words and simple grammar), and generic (e.g., generating the same caption for similar audio clips). When people are asked to describe the content of an audio clip, different people tend to focus on different sound events and describe the clip in diverse ways, from various aspects and with distinct words and grammar. We believe that an audio captioning system should be able to generate diverse captions, either for a fixed audio clip or across similar audio clips. To this end, we propose an adversarial training framework based on a conditional generative adversarial network (C-GAN) to improve the diversity of audio captioning systems. A caption generator and two hybrid discriminators compete and are learned jointly, where the caption generator can be any standard encoder-decoder captioning model, and the hybrid discriminators assess the generated captions against different criteria, such as their naturalness and semantics. We conduct experiments on the Clotho dataset. The results show that our proposed model can generate captions with better diversity than state-of-the-art methods.


SNAF: Sparse-view CBCT Reconstruction with Neural Attenuation Fields

Yu Fang , Lanzhuju Mei , Changjian Li , Yuan Liu , Wenping Wang , Zhiming Cui , Dinggang Shen

Categories: Computer Vision

2022-11-30

Cone beam computed tomography (CBCT) is widely used in clinical practice, especially in dental clinics, but the X-ray radiation dose during acquisition has long been a concern in CBCT imaging. Several works have been proposed to reconstruct high-quality CBCT images from sparse-view 2D projections, but the current state-of-the-art methods suffer from artifacts and a lack of fine details. In this paper, we propose SNAF for sparse-view CBCT reconstruction by learning neural attenuation fields, and we introduce a novel view augmentation strategy to overcome the challenges caused by insufficient data from sparse input views. Our approach achieves high reconstruction quality (30+ PSNR) with only 20 input views (25 times fewer than clinical collections), outperforming the state-of-the-art methods. We further conduct comprehensive experiments and ablation analysis to validate the effectiveness of our approach.
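
For readers unfamiliar with the PSNR figure quoted above, the following is a minimal sketch of the standard peak signal-to-noise ratio, PSNR = 10 log10(R^2 / MSE), where R is the data range. The function name `psnr` and the default `data_range=1.0` are assumptions made for illustration; the abstract does not state how the authors normalize their volumes.

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two arrays of equal shape.

    data_range is the assumed maximum possible value span (hypothetical default).
    """
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    mse = np.mean((ref - est) ** 2)  # mean squared error over all voxels
    if mse == 0:
        return float("inf")  # identical inputs: PSNR is unbounded
    return 10.0 * np.log10(data_range ** 2 / mse)
```

Under this definition, a uniform error of 0.1 on data in [0, 1] corresponds to roughly 20 dB, so a value above 30 dB implies a noticeably smaller average error.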


Out-of-Distribution Detection with Hilbert-Schmidt Independence Optimization

Jingyang Lin , Yu Wang , Qi Cai , Yingwei Pan , Ting Yao , Hongyang Chao , Tao Mei

Categories: Machine Learning | Computer Vision

2022-09-26

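Outlier detection plays a critical role in AI safety, yet the task remains highly challenging. Observations show that deep neural network classifiers tend to assign out-of-distribution (OOD) inputs to in-distribution classes with high confidence. Existing work attempts to address the problem by explicitly imposing uncertainty on the classifier when OOD inputs are exposed to it during training. In this paper, we propose an alternative probabilistic paradigm that is both practically useful and viable for OOD detection. In particular, we impose statistical independence between inlier and outlier data during training, so that inlier data reveals little information about OOD data to the deep estimator. Specifically, we estimate the statistical dependence between inlier and outlier data with the Hilbert-Schmidt Independence Criterion (HSIC) and penalize this measure during training. We also pair our approach with a novel statistical test at inference time, in keeping with our principled motivation. Empirical results show that our method is effective and robust for OOD detection on various benchmarks. Compared with SOTA models, our approach achieves significant improvements in the FPR95, AUROC, and AUPR metrics. Code is available: https://github.com/jylins/hone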
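
To make the HSIC penalty more concrete, here is a minimal sketch of the commonly used biased empirical HSIC estimator, tr(KHLH) / (n - 1)^2, where K and L are kernel matrices and H is the centering matrix. This is not the authors' training code: the Gaussian kernel, the bandwidth `sigma`, and the function names `rbf_kernel` and `hsic` are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(x, sigma=1.0):
    """Gaussian kernel matrix for the rows of x (sigma is an assumed bandwidth)."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T  # pairwise squared distances
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC between paired samples x and y (n rows each)."""
    n = x.shape[0]
    K = rbf_kernel(x, sigma)
    L = rbf_kernel(y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```

The estimate is close to zero for independent samples and grows as the two sets of features become more dependent, which is why such a quantity can serve as a training penalty when the goal is to discourage dependence.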


Lightweight Image Codec via Multi-Grid Multi-Block-Size Vector Quantization (MGBVQ)

Yifan Wang , Zhanxuan Mei , Ioannis Katsavounidis , C. -C. Jay Kuo

Categories: Computer Vision

2022-09-25

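This work proposes a multi-grid multi-block-size vector quantization (MGBVQ) method for image coding. The fundamental idea of image coding is to remove correlations among pixels before quantization and entropy coding, for example through the discrete cosine transform (DCT) and intra prediction adopted by modern image coding standards. We present a new method for removing pixel correlations. First, by decomposing correlations into long-range and short-range correlations, we represent long-range correlations on coarser grids because of their smoothness, which leads to a multi-grid (MG) coding architecture. Second, we show that short-range correlations can be coded effectively by a suite of vector quantizers (VQs). Along this line, we argue for the effectiveness of VQs with very large block sizes and present a convenient way to implement them. Experimental results show that MGBVQ offers excellent rate-distortion (RD) performance, comparable to that of existing image coders, at lower complexity. In addition, it provides a progressively coded bitstream.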
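
As a rough illustration of the vector quantization step mentioned above, the sketch below splits a grayscale image into non-overlapping blocks and codes each block by the index of its nearest codeword. It shows generic block VQ only, not the MGBVQ codec itself; the helper names and the assumption of a precomputed `codebook` array are hypothetical.

```python
import numpy as np

def image_to_blocks(img, b):
    """Split an (h, w) image into flattened non-overlapping b-by-b blocks."""
    h, w = img.shape
    assert h % b == 0 and w % b == 0, "image size must be a multiple of b"
    return img.reshape(h // b, b, w // b, b).swapaxes(1, 2).reshape(-1, b * b)

def blocks_to_image(blocks, h, w, b):
    """Inverse of image_to_blocks."""
    return blocks.reshape(h // b, w // b, b, b).swapaxes(1, 2).reshape(h, w)

def vq_encode(blocks, codebook):
    """Return, for each block, the index of the closest codeword (squared L2)."""
    d2 = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def vq_decode(indices, codebook):
    """Look up the codeword for each index."""
    return codebook[indices]
```

Only the indices need to be stored or transmitted, so the bit cost per block is governed by the codebook size rather than the block size, which is one reason large blocks are attractive when a good codebook is available.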


Multi-level Adversarial Spatio-temporal Learning for Footstep Pressure based FoG Detection

Kun Hu , Shaohui Mei , Wei Wang , Kaylena A. Ehgoetz Martens , Liang Wang , Simon J. G. Lewis , David D. Feng , Zhiyong Wang

Categories: Computer Vision | Artificial Intelligence

2022-09-22

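Freezing of gait (FoG) is one of the most common symptoms of Parkinson's disease, a neurodegenerative disorder of the central nervous system that affects millions of people around the world. To address the pressing need to improve the quality of treatment for FoG, devising computer-aided detection and quantification tools for FoG has become increasingly important. As a non-invasive technique for collecting motion patterns, footstep pressure sequences obtained from pressure-sensitive gait mats provide a great opportunity for evaluating FoG in clinical and home environments. In this study, FoG detection is formulated as a sequential modeling task, and a novel deep learning architecture, the Adversarial Spatio-temporal Network (ASTN), is proposed to learn FoG patterns across multiple levels. A novel adversarial training scheme with a multi-level subject discriminator is introduced to obtain subject-independent FoG representations, which helps reduce the risk of over-fitting caused by high inter-subject variance. As a result, robust FoG detection can be achieved for unseen subjects. The proposed scheme also sheds light on improving subject-level clinical studies in other scenarios, since it can be integrated with many existing deep architectures. To the best of our knowledge, this is one of the first studies of footstep-pressure-based FoG detection, and ASTN is the first deep neural network architecture to pursue subject-independent representations. Experimental results on 393 trials collected from 21 subjects show encouraging performance of the proposed ASTN for FoG detection, with an AUC of 0.85.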


Efficient Speed Planning for Autonomous Driving in Dynamic Environment with Interaction Point Model

Yingbing Chen , Ren Xin , Jie Cheng , Qingwen Zhang , Xiaodong Mei , Ming Liu , Lujia Wang

Categories: Robotics

2022-09-19

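Interacting safely with other traffic participants is one of the core requirements of autonomous driving, especially at intersections and under occlusion. Most existing approaches are designed for specific scenarios and require substantial manual parameter tuning before they can be applied to different situations. To solve this problem, we first propose a learning-based Interaction Point Model (IPM), which describes the interaction between agents in a unified manner using the protection time and the interaction priority. We further integrate the proposed IPM into a novel planning framework and demonstrate its effectiveness and robustness through comprehensive simulations in highly dynamic environments.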


Progressive Glass Segmentation

Letian Yu , Haiyang Mei , Wen Dong , Ziqi Wei , Li Zhu , Yuxin Wang , Xin Yang

Categories: Computer Vision

2022-09-06

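Glass is very common in the real world. Because of the uncertainty of glass regions and the variety of complex scenes behind the glass, the presence of glass poses severe challenges to many computer vision tasks, which makes glass segmentation an important computer vision task. Glass has no visual appearance of its own; it only transmits or reflects the appearance of its surroundings, which makes it fundamentally different from other common objects. To address this challenging task, existing methods typically explore and combine useful cues from different feature levels of a deep network. A characteristic gap exists between features at different levels: deep-layer features embed more high-level semantics and are better at locating the target objects, whereas shallow-layer features have larger spatial sizes and retain richer, more detailed low-level information. Fusing these features naively therefore leads to a sub-optimal solution. In this paper, we approach effective feature fusion for accurate glass segmentation in two steps. First, we attempt to bridge the characteristic gap between features at different levels by developing a Discriminability Enhancement (DE) module, which turns level-specific features into more discriminative representations and thereby alleviates the incompatibility of the features to be fused. Second, we design a Focus-and-Exploration Based Fusion (FEBF) module to mine useful information thoroughly during fusion by highlighting what features at different levels have in common and exploring the differences between them.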


Accurate and Robust Lesion RECIST Diameter Prediction and Segmentation with Transformers

Youbao Tang , Ning Zhang , Yirui Wang , Shenghua He , Mei Han , Jing Xiao , Ruei-Sung Lin

Categories: Computer Vision

2022-08-28

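Automatically measuring lesion/tumor size with RECIST (Response Evaluation Criteria In Solid Tumors) diameters and segmentation is important for computer-aided diagnosis. Although the problem has been studied in recent years, there is still room to improve accuracy and robustness, for example by (1) enhancing features with rich contextual information while keeping a high spatial resolution and (2) involving new tasks and losses for joint optimization. To this end, this paper proposes a transformer-based network (MeaFormer, Measurement transFormer) for lesion RECIST diameter prediction and segmentation (LRDPS). The problem is formulated as three correlated and complementary tasks: lesion segmentation, heatmap prediction, and keypoint regression. To the best of our knowledge, this is the first use of keypoint regression for RECIST diameter prediction. MeaFormer enhances high-resolution features by using transformers to capture their long-range dependencies. Two consistency losses are introduced to explicitly relate these tasks to one another for better optimization. Experiments show that MeaFormer achieves state-of-the-art LRDPS performance on the large-scale DeepLesion dataset and produces results for two downstream clinically relevant tasks, namely 3D lesion segmentation and RECIST assessment in longitudinal studies.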
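
To illustrate how regressed keypoints could be turned into RECIST diameters, the sketch below computes the long-axis and short-axis lengths from two pairs of endpoints. It is a hypothetical post-processing helper, not part of MeaFormer: the function name `recist_diameters`, the endpoint layout, and the `spacing` parameter (millimeters per pixel along x and y) are assumptions made for this example.

```python
import math

def recist_diameters(long_axis, short_axis, spacing=(1.0, 1.0)):
    """Return (long, short) diameters in physical units.

    long_axis, short_axis: ((x1, y1), (x2, y2)) endpoint pairs in pixels.
    spacing: assumed pixel spacing (sx, sy), e.g. millimeters per pixel.
    """
    def length(p, q):
        dx = (p[0] - q[0]) * spacing[0]
        dy = (p[1] - q[1]) * spacing[1]
        return math.hypot(dx, dy)  # Euclidean distance in physical units

    return length(*long_axis), length(*short_axis)
```

For example, with endpoints (0, 0) and (3, 4) for the long axis and a spacing of 2 mm per pixel, the long diameter evaluates to 10 mm.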


Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization

Pengcheng He , Baolin Peng , Liyang Lu , Song Wang , Jie Mei , Yang Liu , Ruochen Xu , Hany Hassan Awadalla , Yu Shi , Chenguang Zhu

Categories: Natural Language Processing | Artificial Intelligence

2022-08-21

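This paper presents Z-Code++, a new pre-trained language model optimized for abstractive text summarization. The model extends the state-of-the-art encoder-decoder model with three techniques. First, we use a two-phase pre-training process to improve the model's performance on low-resource summarization tasks. The model is first pre-trained on text corpora for language understanding and then continually pre-trained on summarization corpora for grounded text generation. Second, we replace the self-attention layers in the encoder with disentangled attention layers, in which each word is represented by two vectors that encode its content and its position, respectively. Third, we use fusion-in-encoder, a simple yet effective method for encoding long sequences in a hierarchical manner. Z-Code++ sets a new state of the art on 9 of 13 text summarization tasks across 5 languages. Our model is parameter-efficient: it outperforms the 600x larger PaLM-540B on XSum and the fine-tuned, 200x larger GPT3-175B on SAMSum. In zero-shot and few-shot settings, our model substantially outperforms competing models.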
